-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🐛 Source Amazon Ads: fix fragile polling generator #8388
Conversation
(ReportGenerationFailure, ReportGenerationInProgress), | ||
max_time=REPORT_WAIT_TIMEOUT, | ||
) | ||
def _try_read_records(self, report_infos): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good job refactoring this messy code. But you have changed logic so now it wouldn't wait for reports status update and expect it to be completed immediately? You misunderstood REPORT_WAIT_TIMEOUT parameter, this is not time required for downloading report, its maximum time that requires for generating async report. Please read comment that you have deleted :)
I agree on that if single report failed with 500 error it should be retried but in this case we should take into account report processing time for each report.
Wouldn't you mind if I pick up this PR and continue work for you. @sherifnada what do you think on priority of this ticket?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@avida it is most valuable to focus on the issues currently in the sprint (FB/sentry), is it possible to describe the needed changes so monai can do them instead? alternatively hand off to another member on the team or come back to the PR after FB/sentry
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But you have changed logic so now it wouldn't wait for reports status update and expect it to be completed immediately?
As I wrote in the PR description above, I refactored the polling generator as a decorated function. As a result, it polls report status for up to REPORT_WAIT_TIMEOUT
, i.e., 20 minutes (see the documentation for max_time
property).
You misunderstood REPORT_WAIT_TIMEOUT parameter, this is not time required for downloading report, its maximum time that requires for generating async report.
False; see the comment above.
I agree on that if single report failed with 500 error it should be retried but in this case we should take into account report processing time for each report.
As already does yours, my implementation uses the same timeout for an entire report batch stored in report_infos
. Report generation is being initiated simultaneously for the batch and, therefore, should be completed in the same 20 minutes window. Even if they're generated not in the same order as initiated, the polling function takes and processes whichever is generated first.
Wouldn't you mind if I pick up this PR and continue work for you. @sherifnada what do you think on priority of this ticket?
Of course not, that would be great 😊 I started this PR because my original issue #6767 hasn't got any attention yet.
To add more context to this issue. Amazon Ads connector is virtually unusable with a real account with several actively running campaigns. Report extraction fails due to one or another error. Roughly 1 in 20, if not less, sync succeeds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@monai Sorry for long time no response, lost this conversation. Thanks for detailed answer, it makes sense, lets merge it.
Few more changes:
After these additional changes, I've finally managed to get the first successful sync for a whole month's reports. So this is the first version that works with an actual active account. Sync status:
|
@marcosmarxm this looks good to me. I have tried with the credentials. Do we need to inform the Connector team to evaluate experience ? |
@harshithmullapudi if connector team approves the PR go ahead |
@avida can you possibly take a quick look at this PR ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First I was confused by nonstandard backoff decorator using for retries but after @monai explained it became clear to me. Thanks!
What
Amazon Ads connector initiates report generation and then polls for successfully generated reports. Currently, this logic is implemented as a single long-running and deeply nested generator that fails on every, possibly recoverable, upstream error.
This change is an attempt to fix the issue. Fixes #6767.
How
The polling generator is refactored as a decorated function. The decorator repeatedly calls it on every known error for a maximum of 30 minutes. When the status check is constantly failing for 30 minutes, it initializes a new report and repeats reinitialization up to 5 times.
Recommended reading order
source_amazon_ads/streams/report_streams/report_streams.py
🚨 User Impact 🚨
None.
Pre-merge Checklist
Community member or Airbyter
airbyte_secret
./gradlew :airbyte-integrations:connectors:<name>:integrationTest
.README.md
bootstrap.md
. See description and examplesdocs/integrations/<source or destination>/<name>.md
including changelog. See changelog exampleAirbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
/test connector=connectors/<name>
command is passing./publish
command described here